202 research outputs found

    Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

    A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise. This assumption is limiting: an agent may encounter settings that dramatically alter the impact of actions. A move ahead action on a wet floor may send the agent twice as far as it expects, and the same action with a broken wheel might transform the expected translation into a rotation. Instead of assuming that the impact of an action stably reflects its pre-defined semantic meaning, we propose to model the impact of actions on-the-fly using latent embeddings. By combining these latent action embeddings with a novel transformer-based policy head, we design an Action Adaptive Policy (AAP). We evaluate our AAP on two challenging visual navigation tasks in the AI2-THOR and Habitat environments and show that our AAP is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces. Moreover, we observe significant improvement in robustness against these actions when evaluating in real-world scenarios.
    Comment: 21 pages, 17 figures, ICLR 202

    The impact of giant jellyfish Nemopilema nomurai blooms on plankton communities in a temperate marginal sea

    This study focused on how the bloom-development process of the giant jellyfish Nemopilema nomurai affects phytoplankton and microzooplankton communities. Two repeated field observations of the jellyfish bloom were conducted in June 2012 and 2014 in the southern Yellow Sea, where blooms of N. nomurai are frequently observed. We demonstrated that the bloom was made up of two stages, namely the developing stage and the mature stage. Total chlorophyll a increased and the concentrations of inorganic nutrients decreased during the developing stage, while both remained stable at low levels during the mature stage. Our analysis revealed that phosphate excreted by growing N. nomurai promoted the growth of phytoplankton at the developing stage. At the mature stage, the size composition of microzooplankton was altered and shifted toward smaller sizes via a top-down process, while phytoplankton composition, affected mainly through a bottom-up process, shifted toward fewer diatoms and cryptophytes and more dinoflagellates.

    Poet: Product-oriented Video Captioner for E-commerce

    In e-commerce, a growing number of user-generated videos are used for product promotion. Generating video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not well suited to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs to capture the dynamic change of fine-grained product-part characteristics. The knowledge leveraging module in Poet differs from the traditional design by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvement over previous methods in generation quality, product aspect capturing, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets, the buyer-generated fashion video dataset (BFVD) and the fan-generated fashion video dataset (FFVD), collected from Mobile Taobao. We will release the desensitized datasets to promote further investigations on both video captioning and general video analysis problems.
    Comment: 10 pages, 3 figures, to appear in ACM MM 2020 proceedings

    Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning

    Traffic accident anticipation aims to predict accidents from dashcam videos as early as possible, which is critical to safety-guaranteed self-driving systems. With cluttered traffic scenes and limited visual cues, it is very challenging to predict how soon an accident will occur from early observed frames. Most existing approaches learn features of accident-relevant agents for accident anticipation while ignoring the features of their spatial and temporal relations. Moreover, current deterministic deep neural networks can be overconfident in false predictions, leading to a high risk of traffic accidents caused by self-driving systems. In this paper, we propose an uncertainty-based accident anticipation model with spatio-temporal relational learning. It sequentially predicts the probability of traffic accident occurrence from dashcam videos. Specifically, we propose to take advantage of graph convolution and recurrent networks for relational feature learning, and leverage Bayesian neural networks to address the intrinsic variability of latent relational representations. The derived uncertainty-based ranking loss is found to significantly boost model performance by improving the quality of relational features. In addition, we collect a new Car Crash Dataset (CCD) for traffic accident anticipation, which contains annotations of environmental attributes and accident reasons. Experimental results on both public and the newly compiled datasets show state-of-the-art performance of our model. Our code and CCD dataset are available at https://github.com/Cogito2012/UString.
    Comment: Accepted by ACM MM 202